YFDM: YOLO for detecting Morse code

With the increasing complexity of the shortwave communication environment, the efficiency and accuracy of the manual detection of Morse code no longer meet actual needs. Therefore, this paper proposes a Morse code detection algorithm called YFDM. For the time–frequency image of the received signal, a combination module of deformable convolution and C3 is used to enhance the backbone network’s attention to the abstract semantics and location information of Morse code. GSConv and VOV-GSCSP modules are used to build a lightweight neck network. Finally, the confidence propagation cluster (CP-Cluster) algorithm is used to filter the detection frame. In an ablation experiment, the parameters and giga floating-point operations per second (GFLOPs) of YFDM were 5.961 M and 9.74 G, respectively, 15.11% and 38.9% less than those of YOLOv5. Moreover, when WIoUv1 was used as the loss function of the bounding box, the AP0.5:0.95 and frames per second (FPS) values of the algorithm reached the highest values, 0.68 and 72.4. The experimental results indicate that the algorithm can effectively reduce the weight of the model while ensuring the detection accuracy and inference speed.

The YFDM (YOLO for Detecting Morse) algorithm model is proposed in this paper to address the aforementioned motivation.Its main work is summarized as follows: (1) The backbone network uses deformable convolution 25 and C3 combination modules to enhance the representation ability of Morse code's geometric characteristics and make up for the defect of fuzzy location information in the deep network.(2) The neck network uses a combination module of GSConv and VOV-GSCSP to simultaneously ensure the detection accuracy and reduce the amount of model parameters and computational complexity 26 .
(3) The confidence propagation cluster 27 algorithm is used to obtain the interaction information between candidate frames and complete the screening to obtain high-quality detection frames.(4) The influence of different IoU bounding-box loss functions on the Morse code detection performance is compared and analyzed, providing reference and data support for IoU loss selection for related tasks.(5) A Morse code time-frequency graph dataset is prepared, imitating a real Morse communication scene.
In the "YFDM detection model" section, the YFDM algorithm model and processing flow are introduced in detail.In the "Experiment" section, we present experimental verification of the proposed method and demonstrate the detection effect.The "Conclusion" section provides the conclusion, which summarizes the main work and experimental conclusions of this paper and puts forward prospects for future work.

Model overview
The overall framework of the YFDM detection model is shown in Fig. 1, which corresponds to the four stages of feature extraction, feature fusion, detection frame reasoning, and post-processing.The parameter settings of each layer of the network model are shown in Table 1.A time-frequency image of a short-wave signal with a bandwidth of 2 MHz is used as the input to the backbone network.The first convolutional layer downsamples the input image, reducing its width and height dimensions to 320 and increasing the number of channels to 64.The DCNv2 module then uses the offsets of the sampling points to modify the convolutional kernel's sensory field and enhance the network's focus on the Morse code region by using the weight coefficients, all the while suppressing the extraction of insufficient information.Three output feature maps with tensor shapes of 80 × 80 × 256, 40 × 40 × 512, and 20 × 20 × 1024 are sent to the neck network's left branch from the backbone network.The up-sampled feature maps are then fused with the coarser-grained feature maps in the backbone network using the concat operation to obtain semantic features at different levels.To aid in the network's improved localization of the Morse code region, the right branch feeds shallow location data into the convolutional layer's finer-grained feature maps.Among these, the newly added GSConv and VoV-GSCSP modules successfully lower the computational complexity and number of model parameters.After receiving the output feature maps at three different scales from the neck network, the detection head network outputs the coordinates of all candidate bounding boxes according to different anchor box sizes, and finally uses the CP-Cluster algorithm to perform the final screening of these candidate boxes.After the processing of the above process, the regions in the time-frequency image of the input signal that may be Morse code will be boxed out, thus realizing the detection of Morse code.The following sections elaborate on the processing flow of each stage of the proposed model.The detection head network outputs the coordinates of all candidate bounding boxes according to various anchor box sizes after receiving the output feature maps at three different scales from the neck network.The CP-Cluster algorithm is then used to carry out the final screening of these candidate boxes.Following the aforementioned procedure, the areas of the input signal's time-frequency image that might contain Morse code will be boxed out, which allows the detection of Morse code.The processing flow of each stage of the proposed model is explained in more detail in the sections that follow.

Backbone network based on DCN and C3 module
The initial input of the detection model is the RGB signal time-frequency map, with a size of 640 × 640.The image needs to be first down-sampled through a convolution layer with a size of 6 × 6, and the feature map after down-sampling is input to the feature extraction backbone network.The network uses multi-layer DCNv2 and C3 combination modules to extract semantic information from feature maps of different scales, and it finally uses the SPPF module to fuse feature maps with different receptive fields.
Since Morse code may cause irregular changes in geometric features owing to frequency offset, code length deviation, or scale scaling, a conventional convolution kernel can only learn geometric feature changes of Morse code by expanding enough training data owing to the lack of an internal mechanism for processing image geometric feature transformation, which is not friendly to the Morse signal detection task under small sample conditions.To solve this problem, DCNv2 uses the offset and weight coefficient of the convolution kernel sampling points to enhance the attention to useful features and suppress the interference of redundant features.Its calculation formula is as follows: where R is the receptive field size of the convolution kernel, w is the weight of each coordinate point, and P 0 is the position where each point in the output feature map y corresponds to the center of the convolution kernel and maps to the input feature map.P n is the relative coordinates of each position of the convolution kernel cor- responding to P 0 , and P n represents the offset of the coordinates of each position.In the model training process, the DCNv2 module promotes the convolutional sampling points to adjust their distribution shape following the geometric changes in Morse code by minimizing the error between the predicted offset and the actual offset, and (1) it improves the convolutional network's attention to the semantic information and location information of Morse code by updating the weight coefficient of the sampling points so that the receptive field of the output feature map is more easily concentrated in the Morse code area.To some extent, the DCNv2 module can compensate for the positioning error caused by the fuzzy position information of Morse code owing to the deepening of the network, and it can help map the features of the deep more accurately to the corresponding position of Morse code in the original image.In addition, the DCNv2 module significantly reduces the computational complexity at the cost of slightly increasing the number of model parameters.C3 divides the output of DCNv2 into two branches for processing.One branch enters BottleNeck to further extract deep abstract features, and the other branch uses a residual structure to transfer shallow details.Its structure is shown in Fig. 2. The C3 module appropriately increases the depth and receptive field of the network, which helps extract more fine-grained features, making Morse-code-related feature semantics more representational.The residual structure also alleviates the gradient extinction caused by the deepening of the network and reduces the computing bottleneck.
For ease of explanation, the combined feature extraction module of DCNv2 and C3 is abbreviated as DC i , where i represents the number of layers of the combined module connected in a series, and 320/2 i represents the size of the i-th layer output feature map after being connected in series.Finally, three new output feature maps (F 2 , F 3 , and F 4 ) can be obtained, where F 4 is the feature map obtained from the output of the four layer combination module after feature fusion through the SPPF layer.

Neck network based on GSConv and VoV-GSCSP
The proposed model uses the structure of the feature pyramid network (FPN) 28 and path aggregation network (PAN) 29 in the neck network to conduct multi-scale fusion of feature maps.The FPN structure can transfer the deep feature information of Morse code to the shallow layer to enhance the multi-scale semantic expression ability, and the PAN structure can transfer the position information in the shallow feature to the deep layer to enhance its positioning ability.To reduce the amount of model parameters and computational complexity, we employ a lightweight convolution module named GSConv in this structure, as shown in Fig. 3.
GSConv uses a shuffle strategy to infiltrate the generated information of standard convolution into the generated information of depth-wise separable convolution (DSC) 30 ; to some extent, it retains the hidden connection between channels, solves the defect that DSC cannot interact with local feature information between different channels, and enhances the nonlinear expression ability of the network.The calculation complexity comparison between GSConv and Standard Convolution (SC) is shown in Eq. ( 3): where W is the width of the output feature map, H is the height of the output feature map, K 1 and K 2 are the sizes of the convolutional kernel, C 1 is the number of channels in the convolutional kernel, and C 2 is the number of channels in the output feature map.When C 1 is larger, the computational complexity of GSConv is closer to 50% of SC.Since the feature map reaching the neck network has a larger channel size and a smaller height and width (2) size, and most of the redundant information has been filtered, GSConv can maximize its lightweight advantages.
Although GSConv is also used for lightweight backbone networks, the premature stacking of GSConv will significantly increase the network depth, resulting in large positioning errors in the characteristic map of the incoming neck network.Therefore, this paper only uses the GSConv module in the neck network.
VOV-GSCSP features a GSConv-based cross-phase local network module through the single aggregation method, as shown in Fig. 4.
Its role is consistent with C3, but it has less computational complexity.In the BottleNeck section, VOV-GSCSP uses GSConv to replace the SC of C3 and places a convolution layer on the shortcut path, which enriches the gradient combination information and helps overcome the problem that the model's positioning ability declines owing to network deepening.For convenience of explanation, GSConv, VOV-GSCSP, and Upsample modules are abbreviated as GSC, VoV and Up, respectively, and the output characteristic diagram after each Concat can be obtained as in Eq. ( 4).

Post-processing of the detection bounding box based on CP-Cluster
After receiving the output from the neck network, the detection head passes through a convolution layer to obtain three feature maps (H 1 , H 2, H 3 ), as shown in Eq. ( 5).
Each grid of the detection head has 18 channels.The channel dimensions are divided into three groups of prior anchor frames corresponding to different sizes in order.The six channel information pieces of each group correspond to four relative coordinates (b x , b y , b w , b h ) of the prediction frame, the confidence of the prediction frame, and the number to be classified.When the coordinates of the center point of the prediction bounding box are very close to the upper-left and lower-right corners of the grid in which it is located, its offset is functionally processed and needs to be taken at negative and positive infinity, and such extreme values cannot be reached in general.In addition, too large prediction box width and height are prone to gradient explosion in the network.Therefore the sensitivity of the grid is eliminated through Eq. ( 6): where b x and b y represent the coordinates of the center point of the prediction box, t x and t y denote the offset of the center point of the prediction box from the top-left vertex of the grid in which it is located, c x and c y are the horizontal and vertical distances of the top-left vertices of the grid in which the center point of the prediction box is located from the top-left vertices of the feature map, which makes the σ function values easier to take to 0 or 1 by limiting the offset range to − 0.5 to 1.5.b w and b h represent the width and height of the final prediction box, P w and P h denote the width and height of the a priori anchor box, t w and t h denote the offset of the width and height of the prediction box, which avoids the overgrowth of the prediction box by restricting the expansion factor to 0 to 4.
After the detection model generates all prediction boxes, it will finally start to filter them.To use the association information between different candidate boxes to obtain a better prediction box, the CP-Cluster algorithm adopts a completely different idea from the NMS 31 algorithm, and its algorithm steps are shown in Table 2.
Among them, B = {b 1 , b 2 , . ..} represents the set of original bounding boxes, and θ is the pre-set IoU thresh- old.If IoU(b i , b j ) > θ , then an undirected edge is used to connect b i and b j , generating a set of Markov random graphs ( G = {g 1 , g 2 , . ..} ).For each g i ∈ G , ε gi is defined as its set of edges and v gi as defined as its set of points.When b i ∈ v gn , its adjacent point set N bi will contain all nodes connected to b i in g n .Equation ( 7) represents the optimization objective of CP-Cluster: www.nature.com/scientificreports/Among them, P(b i ) represents the confidence obtained by b i after inputting information from adjacent bound- ing boxes and its own bounding boxes into the model.B p represents the set of true candidate boxes with the maximum overlap with the real box, and B n represents a set of redundant bounding boxes.b i represents the detection confidence of the detector towards b i .Equation ( 8) provides the evaluation criteria for the weak friend set ( W b i ) and strong friend set ( S b i ).Equations ( 9) and (10) give the generating functions F gp and F gn of the posi- tive and negative messages, respectively.

VoV-GSCSP BottleNeck
In the positive message-generation formula, the number of weak friends of bounding box b i and the high confidence of weak friends are two key factors to enhance b i .Here, Q represents the number of weak friends, and 1 − P(b i ) guarantees that the confidence of b i after being enhanced is less than 1.Positive messages make it more likely that the real prediction frame will be generated from the set of candidate frames with high confidence around Morse code, reducing the interference of irrelevant signals.In the negative message-generation equation, τ (b j , b i ) defines the negative impact factor of negative information, which is used to select strong friend boxes that suppress b i .When α = 1, the box with the highest confidence in the strong friend set is used; when α = 0, the box with the most overlap with b i is used.When the bounding box on one side reaches the number of inhibition times, the bounding box with the largest negative influence factor is selected to punish b i .The algorithm eventually outputs all prediction boxes with the same confidence, and only the prediction box with the longest horizontal dimension is reserved here to ensure the complete detection of Morse code.Finally, the Morse code detection results can be obtained by mapping the filtered prediction box information to the original time-frequency diagram according to different detection heads.

Experiment Datasets
At present, there is no open manual Morse newspaper general dataset in the Morse signal detection task, so we generated a broadband signal candidate set containing multiple Morse signals with different frequencies through simulation and other methods, and we then converted the candidate set into time-frequency images through short-time Fourier transform.The quality of shortwave communication is easily affected by ionospheric changes, and unstable channels may also cause frequency drift of Morse signals.Generally, Morse's communication frequency band contains various modulation signals.We selected representative FSK, PSK, and AM modulation methods to modulate baseband signal sequences or actually collected human voice and music signals to the vicinity of Morse's signals, and we enriched the background independent signals based on white noise interference.The dataset was also used to simulate frequency drift and code rate fluctuation, and we added -5 to 0 dB random white noise to the background.The training set contained 2072 pictures in total, and the verification set contained 350 pictures.Each time-frequency diagram contained at least one Morse signal.The center frequency of the Morse signal was within 5 ~ 12 MHz, the frequency bandwidth was 2 MHz, and the duration of different signals was within 3 ~ 8 s. Figure 5 shows an example of the datasets.( 7) www.nature.com/scientificreports/

Experimental setup and data enhancement
We used a Windows 10 system, and the program code compilation and running environment was Python 3.8, pythoch1.10.0, and cuda11.3.All models were run on NVIDIA RTX3080Ti GPU.The specific parameter settings of the experiment are shown in Table 3.
To enhance the generalization of the detection model and avoid the decline in model detection performance due to the difference in color characteristics between the training set and the test set, we randomly selected 90% of the training time-frequency images for graying processing before each round of training, making the detection model more inclined to learn the geometric characteristics of Morse code than the color characteristics.To enhance the robustness of the model, the mosaic strategy was used to enhance the data; In addition, we pre-clustered a suitable priori anchor box according to the real boxes in the dataset.The specific size is shown in Table 4.

Performance evaluation index
The indicators to measure the performance of the detection model in this paper were Precision, Recall, AP0.5, AP0.5:0.95,F2 score, Parameters, giga floating-point operations per second (GFLOPs), and frames per second  www.nature.com/scientificreports/(FPS).Precision and Recall represent the precision and recall of the detection model, respectively.Because there is a negative correlation between the two, the value of the area under the curve was used to evaluate the overall detection quality of the model.The setting of IoU threshold also affected the performance of AP.The higher the IoU threshold, the stricter the evaluation criteria.Considering that the purpose of Morse code detection is to provide a data basis for later decoding work, the missing detection of Morse signal is more serious than error detection, so the F2 score was selected to increase the influence weight of Recall in performance evaluation, and its calculation formula is as shown in Eq. (11).Parameters and GFLOPs were used to measure the parameters and computational complexity of the model.FPS was used to measure the overall processing speed of the model, and the total time spent from inputting signal data into the model to outputting the detection results was the total time spent.

Model selection
The experiments used the representative Faster-RCNN, SSD, and YOLOv8s in the target detection algorithm to detect multiple Morse codes.After 100 cycles of training, the performance evaluation of each detection model was done, and the results are shown in Table 5.Among them, the AP0.5 of SSD was only 0.936 and was prone to incomplete detection.Although Faster-RCNN had good detection performance, its processing speed and memory usage could not meet the practical requirements.YOLOv8s and YOLOv5s detectors gave better consideration to detection accuracy and reasoning speed.The AP0.5:0.95 of YOLOv8s was the highest, 0.673, but YOLOv5s performed better when IoU = 0.5.In addition, the results of 10 control experiments show that the FPS of YOLOv5s was generally approximately 9% higher than that of YOLOv8s, and that YOLOv5s had fewer parameters and less computational complexity.Therefore, we chose to design our Morse code detection algorithm based on YOLOv5s.

Ablation test
To verify the function of each module in YFDM, we conducted four groups of experiments to ablate each module.The first group was the original YOLOv5s; the second group used the DCN and C3 modules to reconstruct the backbone network of YOLOv5s; the third group used the GSConv and VOV-GSCSP modules to reconstruct the neck network of YOLOv5s; and the fourth group combined the second group and the third group, namely the YFDM detection model.Under the same training conditions, the experiment was repeated 10 times for each group.The experimental results are shown in Tables 6 and 7. Compared with the first group of experiments, the AP0.5:0.95 of the second group increased by 0.5%, the F2 score increased by 0.5%, and GFLOPs decreased by 19.6%.Although the DCNv2 module slightly increased the number of model parameters, it improved the detection accuracy and reduced the computational complexity.In the third group, the number of model parameters and GFLOPs decreased by 16.8% and 19.7%, respectively.Although AP0.5:0.95 showed slight losses, compared with the optimization effect of model parameters and computational complexity, these losses were acceptable.The fourth group of experimental results show that the proposed model obtained the highest F2 score and AP0.5:0.95, and that its model parameters and GFLOPS were only 84.9% and 61.1% of those of YOLOv5s.
Table 8 compares the computational complexity and number of model parameters between the suggested model and different YOLO versions.The data indicates that compared to other YOLO variants in the same class, YOLOv5s and YOLOv8s offer greater lightweight benefits.In addition, the improved YFDM model using YOLOv5s as the base model in the experiments in "Model selection" section has fewer model parameters and ( 11) www.nature.com/scientificreports/lower computational complexity, which provides favorable conditions for the deployment of the algorithm on the mobile device side.This is because there is no discernible difference in the detection performance between the two in the Morse code detection task.

Selection of boundary box regression loss function
To test the impact of different IoU loss on the detection performance, we selected CIoU loss, DIoU loss 32 , EIoU loss 33 , SIoU loss 34 , and WIoU loss 35 to train the proposed model.Table 9 shows the test results.CIoU loss and DIoU loss had the highest F2 score, followed by WIoUv1 loss, and SIoU loss had the lowest score.The highest AP0.5:0.95 of 0.673 was obtained for WIoUv1 Loss, followed by 0.67 for CIoU loss, and the AP0.5:0.95 of other IoU loss was lower than 0.67.DIoU loss had the highest FPS, WIoUv1 loss was second only to DIoU loss, and SIoU loss and EIoU loss had relatively close FPS values.CIoU loss had the lowest FPS, 102.044.The DIoU loss uses the Euclidean distance of the center points of the two frames for regression, and it had the highest FPS.However, owing to the lack of specific coordinate information of Morse code, even though the prior frame size had been determined, the AP0.5:0.95 of the DIoU loss was still the lowest, 0.663.CIoU loss considers the overlapping area, center point distance, and aspect ratio of prediction frames.EIoU loss calculates the length and width loss of real frames and prediction frames on the basis of CIoU loss.Because the inverse trigonometric function calculation involved in CIoU loss affects the reasoning speed, the FPS of EIoU loss was slightly faster, but the detection accuracy and F2 score of the two were not significantly different.SIoU loss takes the vector  www.nature.com/scientificreports/angle between two frames into account, and its precision was the highest, 0.994, but the F2 score was the lowest, 0.9876, so there was a relatively high probability of missing detection.WIOUv1 loss constructs attention-based bounding-box loss, which can significantly weaken the punishment of geometric metrics for low-quality examples.The proposed model achieved the highest AP0.5:0.95 and a relatively high FPS.

Impact of CP-Cluster on detection performance
In this experiment, the training model in "Selection of boundary box regression loss function" section was used for the reasoning prediction box, and NMS and CP-Cluster algorithms were used for candidate box screening.The processing results of the two algorithms are shown in Figs. 6 and 7.
It can be seen from the figure that the AP0.5:0.95 of the CP-Cluster method was significantly improved compared with the NMS method.When WIoUv1 was selected as the loss function of the bounding box, the AP0.5:0.95 of this model reached 0.68, which was 2.26% higher than the first group of models in the ablation experiment in "Ablation test" section.The CP-Cluster method requires a certain amount of computing power for message propagation, which increases the overall time of model output results.However, its FPS could still reach 71.76, which can meet the real-time requirements of Morse code detection.

Conclusion
Based on the embeddability of Morse code detection algorithm, we designed a Morse code detection algorithm called YFDM combining the advantages of DCNV2, GSConv, and VOV-GSCSP modules.In the training phase, we analyzed the impact of different IoU loss functions on detection performance, and we found that when WIoUv1 was used as the loss function of the bounding box, the training model could achieve the highest AP0.5:0.95 and a relatively fast processing speed.In the candidate box processing phase, although the introduction of CP-Cluster method increased the overall time consumption, it significantly improved the detection quality.Finally, the YFDM algorithm effectively reduced the number of parameters and the computational complexity of the algorithm model on the premise of ensuring sufficient detection accuracy and low processing delay.

Data availability
The datasets used and/or analysed during the current study available from the corresponding author on reasonable request.

Figure 1 .
Figure 1.Schematic diagram of the YFDM model frame.

Figure 5 .
Figure 5. Morse code detection dataset.(a) General case, (b) presence of frequency offset, (c) presence of code speed deviation, (d) at low SNR, (e) frequency close to each other, (f) overlap with other signals.

Figure 8
shows the detection effects of the proposed model and other models.The proposed model is trained by utilizing WIoUv1 as a loss function and, during the post-processing phase, employing the CP-Cluster technique to filter the anticipated bounding boxes.When the SSD detector was used, one of the Morse code was not completely detected.The detection results of Faster-RCNN and YOLOv5s are closer to each other, but Faster-RCNN had difficulty meeting the real-time requirements of Morse code detection and required a large amount of memory space of the hardware device.The proposed model had a similar detection effect as YOLOv8s, and their detection frames could fit the Morse message region more accurately, but the proposed model has the advantages of a light weight and processing delay, and it is more suitable for Morse code detection task.

Figure 8 .
Figure 8. Results of different detection models.

Table 1 .
Parameter setting and output feature drawing size of each layer (settings: [out channels, kernel size, stride, padding]; feature maps: [w, h, out channels]).

Table 2 .
Algorithm steps for CP-Cluster.

Table 5 .
Comparison of different models.

Table 6 .
Results of ablation experiment.

Table 7 .
Parameters and GFLOPs of the proposed model (+ / − represents the change compared with YOLOv5s).

Table 8 .
Comparison of parameters and GFLOPs for different YOLO versions.

Table 9 .
Impact of different IoUs on detection accuracy.